Improving robustness of speech recognition performance to aggregate of noises by two-dimensional visualization

Authors

  • Makoto Shozakai
  • Goshu Nagino
Abstract

This paper proposes a new methodology for improving the robustness of recognition performance to an aggregate of noises by means of a two-dimensional visualization technique. First, as many of the noises occurring in adverse environments as possible are collected. Then, a hidden Markov model (HMM) is trained for each collected noise. The set of trained HMMs is visualized as a two-dimensional map by a statistical multidimensional scaling technique called the COSMOS method. The noises corresponding to the HMMs located on the periphery of the map are overlaid on the clean speech used for training the HMMs of the acoustic models. The results show that this methodology significantly reduces the recognition error rate, by around 60%, for non-stationary noises overlaid on the voice interval of a word.
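
The selection step described in the abstract (map the noise models to two dimensions, then pick the ones on the periphery) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: each noise model is represented by a hypothetical mean feature vector rather than a trained HMM, the distance between models is plain Euclidean distance, classical multidimensional scaling is swapped in for the COSMOS method, and "periphery" is taken to mean the points farthest from the centroid of the map. All names and numbers (`noise_means`, `classical_mds`, `periphery`, `k`) are made up for the example.

```python
import math

def pairwise_distances(vectors):
    """Euclidean distance between every pair of model vectors
    (an assumed stand-in for a distance between trained noise HMMs)."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]

def classical_mds(dist, dims=2, iters=500):
    """Classical MDS: embed n items in `dims` dimensions from a distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant remaining eigenvector of B.
        v = [math.sin(i + 1.0 + d) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

def periphery(names, coords, k):
    """Return the k items whose map positions lie farthest from the centroid."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    ranked = sorted(
        range(n),
        key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy),
        reverse=True,
    )
    return [names[i] for i in ranked[:k]]

# Hypothetical per-noise mean feature vectors (illustrative values only).
noise_means = {
    "office": [0.0, 0.0, 0.2],
    "cafe": [1.0, 0.5, -0.1],
    "street": [-1.0, 0.5, 0.3],
    "car": [0.5, -1.0, 0.0],
    "siren": [8.0, 0.0, 0.1],
    "drill": [-7.0, 3.0, -0.2],
}
names = list(noise_means)
distances = pairwise_distances(list(noise_means.values()))
coords = classical_mds(distances, dims=2)
selected = periphery(names, coords, k=2)
print(selected)
```

In this toy data the two noises that sit far from the cluster of everyday noises are the ones selected; in the setting the abstract describes, the corresponding recordings would then be overlaid on the clean training speech.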

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Research and Development of Robust Speech Recognition

This paper describes recent research and development activities on robust ASR (automatic speech recognition) in NTT Human Interface Laboratories. ASR system design has been changing from the experimental to the commercial level. A relevant issue in achieving practical ASR is robustness against environmental noise and speaker/circuit differences. Adaptation techniques have been widely investigat...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that could add robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Since the CWPT coefficients represent all different frequency bands of the input signal, decomposing the input signal i...

Publication date: 2005